Data Wrangling with NRE Cleaned Cities

What does this notebook do?

It is the analisys of the cleaned NRW vergabe data.

Conclusions:

The COVID-19 related tenders colected through VERGABE.NRW API are the purchases made by public bodies that were not related to hospitals or health policies in general. This is duo to a federal policy that aimed to centrilize all purchases and avoid competition among cities and Ländern.

This dataset compreends the period between 17.12.2019 and 07.01.2022.

1. Main Problems

This data has two main problems:

1 - There is no stardazation of city names. Instead of using a unique city code, officials just write down the city name or the neighborhood (Stadtteil) the field buyer_name . Making more difficult to analyze the data grouped by city. It is worth mention that other information (latitude, longiude and zip code) have more than one value to each city. The ideal is a one single code number to each buyer entity.

2 - There is data missing. This Dataset has 215 of the 396 cities, which means that 181 cities are missing. Besides, we are not sure that all purchases made by the cities included are in this dataset. Further investigation need to be done. From those cities, we only found COVID-19 related purchases for 35 of them (8% of NRW).

3 - Another huge problem is the lack of information about sellers and values. Only 4% of the procurements registred have the information about seller_name, seller_town, seller_country and result_value and the presence of this information is random. Without this important information, civil society cannot monitor basic things like comparing the value of the purchased item with purchases of the same iten in other towns, check if the company is allowed to governments or see if this company is somehow related to politicians.

4 - There is also missing information in other fields: 7% of the field procedure_typeis blank.

5 - It seems there is not much guidance for the public officials in charge of sending the information to the system. One example is the field result_description, for each some entries has only a few words and for others have detailed description of products and its use.


2. Purchases and types of procurement in the period

Due to the emergency rule setled in the beggining of the pandemic (SEE EXACT DATE), we expected that overall purchases would behave in a specific way, responding to the new needs (sanitizers, masks) and lockdown rules. However, this is not how the data behaved.

All procurements: After the end of fiscal year, in December 2019, there was a new high demand of purchases in April 2020, the highest in the whole period. This can partially be related to the fact that many offices worked in home-office mode, so there were no need to buy many supplies during the following period. However, this does not look like a general trend that deals with the expection rule (See Gr.1) . Regarding the procedure type, the majority of procurements were the open competiton type (Öffentliche Ausschreibung) and all the othe procurement types, including the 5 Ex-Post types, had the same behavior among the type.

COVID-19 related procurements: The purchases of COVID-19 related items followed the trend of the overall purchases, being higher in April 2020 and declining after that (see Gr. 7). However, 55% of the COVID-19 tenders were ex-post types, and only 30% were open competition (Öffentliche Ausschreibung). Besides, when we look at the distribution of types of procedures by time (Gr.9) we see that the ex-post purchases were higher between March and May 2021, close to the 2021's German Summer, when the conuntry was a bit opener.


3. What was bought?

According to the result_title, the two mainly itens bought were masks, schnelltests and lollitests for children (see wordcloud). Below there is also a list of public bodyes who purchased covid-19 related itens. I could not identify a interessant outcome of this.


Code:

0. Fixing the df

1.Distribution of tenders per month:

In the beggining of the system, we have a higher value os tenders added, but the amount of tenders go up and down during the period, beeing low at january 2022 partially due to lack of days (we got this data before 15.01.2022)

2. Proportion of types of tender:

According to the OCDE report, we should see at the beggining of the pandemic an increase of the non-competitive procurements. But as we see, both types behave the same way.

3. Information about sellers:

For all ex-post procedures, there is not a single record of result_value, result_seller_name, result_seller_town or result_seller_country

There is no way to know why these 4% of tenders have the data.

Without data of sellers, there is no way we can do monitoring to know, for example, if a company is not allowed to sell to the government

We have found only 92 inputs with the selected terms. Now we are going to look at them:

5. Who is buying?

6.What are they buying?

7.What are they buying?

8.When were they bought?

It doesn't say much.

9.What were the types of procedure used?

55% of the tenders were ex-post type.

Let's see if the ex-post ones were mostly by the beggining of the pandemic

We see that in May 2021 it was the highest rate of purchases made by ex-post procedures. The purchases made in this dataset are not for hospitals, but for government bodies. Particularly, I don't see why july 2021 would be the moment with the most ex-post procurements.